Identifying Person Duplicates of Short Geographic Distance by Computer Matching

Author

  • Thomas Mule
Abstract

The Census Bureau conducted evaluations of person duplication in Census 2000. Duplicates separated by short geographic distances were identified by both clerical and computer matching. The evaluations showed that, for these short-distance duplicates, the computer matching algorithms did not find all of the duplicates identified by the clerks. However, the computer matching algorithms in those evaluations were developed primarily to identify duplicates separated by longer distances. This report analyzes the potential of computer matching when the focus is on short-distance duplicates. I used the Bureau's record linkage software to do the computer matching, and I used SAS to compare the computer matching results with the clerical results. First, I attempted to identify groups of links with high concentrations of true duplicates: I used Enterprise Miner to generate decision trees for several approaches and compared their results. Second, I analyzed the clerical duplicates that the computer matching did not identify, looking for patterns in these cases.
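The comparison described above (computer links checked against clerical duplicates, plus the search for link groups with high concentrations of true duplicates) can be sketched as simple set bookkeeping. This is a minimal Python sketch under stated assumptions, not the report's SAS code: each duplicate is assumed to be an unordered pair of person-record IDs, and the record IDs, the `score` attribute, and the score threshold used for grouping are hypothetical. The `group_of` function is a plain stand-in for the leaves of a decision tree, not the Enterprise Miner models or the Bureau's record linkage software.

```python
from collections import defaultdict


def as_pair(id_a, id_b):
    # Hypothetical record IDs; a frozenset makes (a, b) and (b, a) the same pair.
    return frozenset((id_a, id_b))


def compare_to_clerical(computer_links, clerical_duplicates):
    """Compare computer-matched links with clerically identified duplicates.

    computer_links: iterable of (id_a, id_b) pairs linked by computer matching.
    clerical_duplicates: iterable of (id_a, id_b) pairs the clerks judged duplicates.
    """
    computer = {as_pair(a, b) for a, b in computer_links}
    clerical = {as_pair(a, b) for a, b in clerical_duplicates}
    found = computer & clerical   # duplicates both methods identified
    missed = clerical - computer  # clerical duplicates the computer did not find
    extra = computer - clerical   # computer links the clerks did not confirm
    recall = len(found) / len(clerical) if clerical else 0.0
    return {"found": found, "missed": missed, "extra": extra, "recall": recall}


def duplicate_rate_by_group(links, clerical_duplicates, group_of):
    """Share of computer links in each group that the clerks confirmed.

    links: iterable of (id_a, id_b, attributes) tuples.
    group_of: maps a link's attributes to a group label (assumed stand-in
    for decision-tree leaves).
    """
    clerical = {as_pair(a, b) for a, b in clerical_duplicates}
    totals = defaultdict(int)
    confirmed = defaultdict(int)
    for a, b, attrs in links:
        group = group_of(attrs)
        totals[group] += 1
        if as_pair(a, b) in clerical:
            confirmed[group] += 1
    return {g: confirmed[g] / totals[g] for g in totals}


if __name__ == "__main__":
    # Toy data: the IDs and "score" values are invented for illustration only.
    links = [
        ("p1", "p2", {"score": 18.0}),
        ("p3", "p4", {"score": 16.5}),
        ("p5", "p6", {"score": 4.2}),
    ]
    clerical = [("p2", "p1"), ("p4", "p3"), ("p7", "p8")]
    result = compare_to_clerical([(a, b) for a, b, _ in links], clerical)
    print(result["recall"], result["missed"])
    rates = duplicate_rate_by_group(
        links, clerical, lambda attrs: "high" if attrs["score"] >= 10 else "low"
    )
    print(rates)
```

The `missed` set corresponds to the second analysis in the abstract (clerical duplicates the computer did not find), and `duplicate_rate_by_group` corresponds to the first (groups of links with high concentrations of true duplicates).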

Similar resources

Effective and Efficient XML Duplicate Detection Using Levenshtein Distance Algorithm

There is big amount of work on discovering duplicates in relational data; merely elite findings concentrate on duplication in additional multifaceted hierarchical structures. Electronic information is one of the key factors in several business operations, applications, and determinations, at the same time as an outcome, guarantee its superiority is necessary. Duplicates are several delegacy of ...

Adaptive Approximate Record Matching

Typographical data entry errors and incomplete documents, produce imperfect records in real world databases. These errors generate distinct records which belong to the same entity. The aim of Approximate Record Matching is to find multiple records which belong to an entity. In this paper, an algorithm for Approximate Record Matching is proposed that can be adapted automatically with input error...

A procedure for Web Service Selection Using WS-Policy Semantic Matching

In general, Policy-based approaches play an important role in the management of web services, for instance, in the choice of semantic web service and quality of services (QoS) in particular. The present research work illustrates a procedure for the web service selection among functionality similar web services based on WS-Policy semantic matching. In this study, the procedure of WS-Policy publi...

Learning to Combine Trained Distance Metrics for Duplicate Detection in Databases

The problem of identifying approximately duplicate records in databases has previously been studied as record linkage, the merge/purge problem, hardening soft databases, and field matching. Most existing approaches have focused on efficient algorithms for locating potential duplicates rather than precise similarity metrics for comparing records. In this paper, we present a domain-independent me...

Matching of Polygon Objects by Optimizing Geometric Criteria

Despite the semantic criteria, geometric criteria have different performances on polygon feature matching in different vector datasets. By using these criteria for measuring the similarity of two polygons in all matchings, the same results would not have been obtained. To achieve the best matching results, the determination of optimal geometric criteria for each dataset is considered necessary....

Publication date: 2003